Differential Effects of the Minimum Point Threshold on Topology and Motion in DBSCAN Clustering

Abstract

We present a rigorous decomposition of how the DBSCAN clustering algorithm's behavior changes as the minimum points parameter (minPts) increases while holding the neighborhood radius $\varepsilon$ fixed. Through two main theorems, we establish: (1) an exact matching-based decomposition of the change in total inter-cluster center separation into motion and topological components, and (2) a precise formula for centroid shifts under point removal ("peeling"). These results provide a mathematical framework for understanding the dual mechanisms—boundary erosion and cluster birth/death events—that govern DBSCAN's sensitivity to parameter choices.

Fix $\varepsilon>0$ and a dataset $X=\{x_1,\dots,x_n\}\subset\mathbb{R}^3$. For each integer $m\in\{1,\dots,n\}$, run DBSCAN with parameters $(\varepsilon,\text{minPts}=m)$. Let the non-noise clusters be

$$ \mathcal{C}(m)=\{C_1^{(m)},\dots,C_{k(m)}^{(m)}\}, $$

and define each cluster center as the mean

$$ \mu_i^{(m)}:=\frac{1}{|C_i^{(m)}|}\sum_{x\in C_i^{(m)}} x\in\mathbb{R}^3. $$

Define the total inter-center separation

$$ S(m):=\sum_{1\le i Because $m$ is discrete, the natural "derivative" is the finite difference

$$ \Delta S(m):=S(m+1)-S(m). $$

Intuition to hold onto: Increasing minPts makes "core-ness" stricter. At fixed $\varepsilon$, DBSCAN tends to peel low-density fringes/bridges (moving centers) and may also split components or kill small clusters (changing which centers even exist). The two theorems below isolate these effects cleanly.

Theorem 1: Matching Decomposition of $\Delta S(m)$ into Motion vs. Topology

THEOREM 1

Statement

Let $I=\{1,\dots,k(m)\}$ index clusters at $m$, and $J=\{1,\dots,k(m+1)\}$ index clusters at $m+1$. Form the cost matrix

$$ c_{ij}:=\big|\mu_i^{(m)}-\mu_j^{(m+1)}\big|\qquad (i\in I,\;j\in J). $$

Let $M\subset I\times J$ be a minimum-cost one-to-one matching between cluster centers at $m$ and at $m+1$ (e.g. the Hungarian algorithm). Define the matched ("survivor") index sets

$$ I_M:=\{i\in I:\exists j,(i,j)\in M\},\qquad J_M:=\{j\in J:\exists i,(i,j)\in M\}, $$

and the unmatched sets

$$ I_D:=I\setminus I_M \quad(\text{deaths}),\qquad J_B:=J\setminus J_M \quad(\text{births}). $$

For each $i\in I_M$, let $\pi(i)\in J_M$ denote its match, and define the center shift

$$ \delta_i := \mu_{\pi(i)}^{(m+1)}-\mu_i^{(m)}. $$

Then $\Delta S(m)$ decomposes exactly as

$$ \Delta S(m) = \underbrace{\Big(S_{MM}^{\text{new}}-S_{MM}^{\text{old}}\Big)}_{\text{matched-center motion}} + \underbrace{\Big(S_{MB}^{\text{new}}+S_{BB}^{\text{new}}-S_{MD}^{\text{old}}-S_{DD}^{\text{old}}\Big)}_{\text{birth/death topology term}} $$

where

$$ S_{MM}^{\text{old}}=\sum_{i<\ell, i,\ell\in I_M}|\mu_i^{(m)}-\mu_\ell^{(m)}|, $$ $$ S_{MM}^{\text{new}}=\sum_{i<\ell, i,\ell\in I_M}|\mu_{\pi(i)}^{(m+1)}-\mu_{\pi(\ell)}^{(m+1)}|, $$ $$ S_{MB}^{\text{new}}=\sum_{j\in J_B}\sum_{i\in I_M}|\mu_j^{(m+1)}-\mu_{\pi(i)}^{(m+1)}|, $$ $$ S_{BB}^{\text{new}}=\sum_{j<\ell, j,\ell\in J_B}|\mu_j^{(m+1)}-\mu_\ell^{(m+1)}|, $$ $$ S_{MD}^{\text{old}}=\sum_{d\in I_D}\sum_{i\in I_M}|\mu_d^{(m)}-\mu_i^{(m)}|, $$ $$ S_{DD}^{\text{old}}=\sum_{d<e, d,e\in I_D}|\mu_d^{(m)}-\mu_e^{(m)}|. $$

Moreover, the motion term is controlled by the shifts:

$$ \big|S_{MM}^{\text{new}}-S_{MM}^{\text{old}}\big| \;\le\; (|I_M|-1)\sum_{i\in I_M}|\delta_i|. $$

Intuition. The matching $M$ is how we formalize "which cluster at $m+1$ corresponds to which cluster at $m$" without handwaving. Once you do that, every change in $S$ comes from only two sources:

Motion: the same clusters' centers moved (peeling, reassignments, shrinkage).
Topology: clusters appeared/disappeared under the one-to-one matching (splits show up as one survivor plus extra births; vanishing shows up as deaths).

The bound says: if centers hardly move, the matched part of $S$ hardly changes—so any big jump must be explained by births/deaths (topology events).

Proof. Partition the old cluster index set as $I=I_M\sqcup I_D$. Then the sum defining $S(m)$ splits into three disjoint parts: pairs entirely within $I_M$, pairs crossing $I_M$ and $I_D$, and pairs entirely within $I_D$. Hence $$ S(m)=S_{MM}^{\text{old}}+S_{MD}^{\text{old}}+S_{DD}^{\text{old}}. $$

Likewise, partition $J=J_M\sqcup J_B$. Using the matching bijection $i\mapsto \pi(i)$ between $I_M$ and $J_M$, the sum defining $S(m+1)$ splits as

$$ S(m+1)=S_{MM}^{\text{new}}+S_{MB}^{\text{new}}+S_{BB}^{\text{new}}. $$

Subtracting yields the displayed exact decomposition.

For the bound, fix matched indices $i,\ell\in I_M$ and write

$$ \mu_{\pi(i)}^{(m+1)}=\mu_i^{(m)}+\delta_i,\qquad \mu_{\pi(\ell)}^{(m+1)}=\mu_\ell^{(m)}+\delta_\ell. $$

Then the norm is 1-Lipschitz:

$$ \Big||(\mu_i-\mu_\ell)+(\delta_i-\delta_\ell)|-|\mu_i-\mu_\ell|\Big| \le |\delta_i-\delta_\ell| \le |\delta_i|+|\delta_\ell|. $$

Summing over all $\binom{|I_M|}{2}$ pairs counts each $|\delta_i|$ exactly $(|I_M|-1)$ times, giving

$$ \big|S_{MM}^{\text{new}}-S_{MM}^{\text{old}}\big|\le (|I_M|-1)\sum_{i\in I_M}|\delta_i|. $$ □

Theorem 2: Exact Centroid Shift Under "Peeling" When minPts Increases

THEOREM 2

Statement

Consider a matched cluster $i\in I_M$. Suppose that raising minPts from $m$ to $m+1$ acts on this cluster by removing a subset of its points (turning them into noise or assigning them elsewhere) while keeping the rest, i.e.

$$ C_i^{(m)} = A_i \sqcup R_i,\qquad C_{\pi(i)}^{(m+1)} = A_i, $$

where $A_i$ are the kept points and $R_i$ are the removed ("peeled") points, disjoint.

Let $\mu_{A_i}$ and $\mu_{R_i}$ denote the means of $A_i$ and $R_i$. Then the center shift $\delta_i=\mu_{\pi(i)}^{(m+1)}-\mu_i^{(m)}$ satisfies the exact identity

$$ \delta_i = \frac{|R_i|}{|A_i|}\big(\mu_i^{(m)}-\mu_{R_i}\big). $$

In particular,

$$ |\delta_i| = \frac{|R_i|}{|A_i|}\,\big|\mu_i^{(m)}-\mu_{R_i}\big|. $$

Intuition. This is the cleanest "mechanism" linking minPts to center motion. Increasing minPts tends to strip low-density boundary/bridge points first. The formula says the center moves in the direction away from the removed points' mean, with a magnitude that scales like:

how much mass was removed ($|R_i|/|A_i|$), and
how far the removed mass sat from the old center ($|\mu_i^{(m)}-\mu_{R_i}|$).

So when minPts breaks a thin bridge between two dense blobs, $R_i$ often lies "between" modes; centers retreat inward, typically increasing inter-center separation—until cluster deaths start reducing the number of centers.

Proof. Write the old center using the disjoint union: $$ \mu_i^{(m)}=\mu_{A_i\cup R_i}=\frac{1}{|A_i|+|R_i|}\Big(\sum_{x\in A_i}x+\sum_{x\in R_i}x\Big) =\frac{|A_i|\mu_{A_i}+|R_i|\mu_{R_i}}{|A_i|+|R_i|}. $$

The new matched center is $\mu_{\pi(i)}^{(m+1)}=\mu_{A_i}$. Therefore

$$ \delta_i=\mu_{A_i}-\frac{|A_i|\mu_{A_i}+|R_i|\mu_{R_i}}{|A_i|+|R_i|} =\frac{|R_i|}{|A_i|+|R_i|}\,(\mu_{A_i}-\mu_{R_i}). $$

Now eliminate $\mu_{A_i}$ in favor of $\mu_i^{(m)}$ by rearranging the convex combination:

$$ \mu_i^{(m)}-\mu_{R_i}=\frac{|A_i|}{|A_i|+|R_i|}(\mu_{A_i}-\mu_{R_i}). $$

Substitute into the previous line to obtain

$$ \delta_i=\frac{|R_i|}{|A_i|}\,(\mu_i^{(m)}-\mu_{R_i}), $$

and taking norms yields the last identity. □

How the Two Theorems Fit Together: Grounding Intuition

At fixed $\varepsilon$, increasing minPts changes $S(m)$ through exactly the two channels in Theorem 1:

Matched-center motion is governed by $\sum|\delta_i|$. Theorem 2 makes $\delta_i$ concrete: it is driven by how much boundary mass is peeled and where that mass lies.
Birth/death topology events dominate when clusters split or vanish. In the matching framework, a split typically appears as one matched survivor plus additional "birth" centers; a vanishing appears as a "death." These events can change $S(m)$ even if the surviving centers barely move.

This gives a crisp diagnostic: if $S(m)$ jumps but the $\delta_i$ are small, the jump must be largely topological (births/deaths). If $S(m)$ drifts smoothly with no births/deaths, it's largely peeling-driven center motion.

Conclusion

We have presented a rigorous mathematical framework for understanding how DBSCAN's behavior evolves as the minPts parameter increases at fixed $\varepsilon$. The matching-based decomposition (Theorem 1) separates continuous center motion from discrete topological events, while the peeling formula (Theorem 2) quantifies the mechanism by which boundary erosion drives centroid shifts. Together, these results provide both theoretical insight and practical diagnostic tools for parameter selection and cluster stability analysis in density-based clustering.